@adriangb
Contributor

Related issues

Closes #19894. I think this will also help with #19387.

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate proto Related to proto crate labels Jan 29, 2026
Comment on lines +1757 to +1758
```rust
// Wrap with ProjectionExec if projection is present and differs from scan output
// (either non-identity, or fewer columns due to filter-only columns)
```
Contributor Author

The idea for #19387 is that we might be able to push down trivial expressions here, thus avoiding the need for any physical optimizer changes/rules.
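A minimal sketch of the wrap-or-not decision described in the diff comment on lines +1757 to +1758, assuming the projection is a list of column indices into the scan's output. `needs_projection` is a hypothetical helper for illustration, not DataFusion's API.

```rust
/// Hypothetical helper (not part of DataFusion): decide whether a scan needs a
/// ProjectionExec on top of it.
///
/// `projection` holds indices into the scan's output columns; `None` means
/// "no projection, pass every column through".
fn needs_projection(projection: Option<&[usize]>, scan_output_width: usize) -> bool {
    match projection {
        // No projection requested: the scan output is already what we want.
        None => false,
        Some(indices) => {
            // Identity = same width and every column kept in its original position.
            let is_identity = indices.len() == scan_output_width
                && indices.iter().enumerate().all(|(pos, &idx)| pos == idx);
            // Reordered columns, or fewer columns (e.g. a column that was only
            // read so a pushed-down filter could evaluate it), need a wrapper.
            !is_identity
        }
    }
}
```

For example, if the scan reads `[id, type]` only because a filter needs `type`, and the query projects just `id`, then `needs_projection(Some(&[0][..]), 2)` is `true`.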

Comment on lines +2166 to +2170
```rust
LogicalPlan::Filter(filter) => {
    // Split AND predicates into individual expressions
    filters.extend(split_conjunction(&filter.predicate).into_iter().cloned());
}
```
Contributor Author

Maybe we can drop this since filters are effectively pushed into TableScan now?
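For context on what the `Filter` arm above does: `split_conjunction` flattens a predicate like `a AND (b AND c)` into `[a, b, c]`, so each conjunct can be handled independently. Below is a self-contained sketch over a toy `Expr` enum; the enum and the recursive helper are assumptions for illustration, not DataFusion's actual `Expr` type or implementation.

```rust
/// Toy expression type (hypothetical; DataFusion's `Expr` is much richer).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Flatten nested ANDs into a list of borrowed conjuncts, left to right.
fn split_conjunction(expr: &Expr) -> Vec<&Expr> {
    let mut out = Vec::new();
    split_impl(expr, &mut out);
    out
}

fn split_impl<'a>(expr: &'a Expr, out: &mut Vec<&'a Expr>) {
    match expr {
        // Recurse into both sides of an AND.
        Expr::And(left, right) => {
            split_impl(left, out);
            split_impl(right, out);
        }
        // Anything else is a single conjunct.
        other => out.push(other),
    }
}
```

The call in the diff, `filters.extend(split_conjunction(&filter.predicate).into_iter().cloned())`, maps directly onto this shape: borrowed conjuncts are cloned into the owned `filters` list.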

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jan 29, 2026
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Jan 29, 2026
@adriangb adriangb marked this pull request as ready for review January 29, 2026 21:21
@adriangb
Contributor Author

@kosiew would you be open to reviewing this?

Contributor

@kosiew kosiew left a comment

Found an issue and stopping as there are other CI issues.

Comment on lines +2123 to +2125
```rust
LogicalPlan::TableScan(scan) => {
    // Also extract filters from TableScan (where they may be pushed down)
    filters.extend(scan.filters.iter().cloned());
```
Contributor

I think this would be a problem in scenarios like UPDATE target FROM source ..., where the input plan contains TableScan nodes for both target and source. This function would extract filters from both tables and attempt to apply all of them to the target table (after stripping qualifiers).
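A minimal reproduction of the concern, assuming a toy plan type (not DataFusion's `LogicalPlan`) and naive, token-based qualifier stripping. `Plan`, `collect_all_scan_filters` and `strip_qualifiers` are hypothetical names. Collecting `filters` from every scan under the UPDATE input picks up the source table's predicate, which looks like a predicate on the target once qualifiers are removed.

```rust
/// Toy plan (hypothetical): scans carry pushed-down filters as plain strings.
#[allow(dead_code)] // `table` is intentionally ignored by the naive collector
enum Plan {
    Scan { table: String, filters: Vec<String> },
    Join { left: Box<Plan>, right: Box<Plan> },
}

/// Naive collector: gathers filters from *every* scan, whatever its table.
fn collect_all_scan_filters(plan: &Plan, out: &mut Vec<String>) {
    match plan {
        Plan::Scan { filters, .. } => out.extend(filters.iter().cloned()),
        Plan::Join { left, right } => {
            collect_all_scan_filters(left, out);
            collect_all_scan_filters(right, out);
        }
    }
}

/// Naive qualifier stripping: `src.type` -> `type` for each whitespace token.
fn strip_qualifiers(expr: &str) -> String {
    expr.split_whitespace()
        .map(|tok| tok.rsplit('.').next().unwrap_or(tok))
        .collect::<Vec<_>>()
        .join(" ")
}
```

With `trg` carrying `trg.id > 100` and `src` carrying `src.type = 'active'`, the stripped result is `["id > 100", "type = 'active'"]`, and the second entry would then be applied to `trg`, which may not even have a `type` column.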

Contributor Author

Hmm, interesting. Is the problem the stripping of qualifiers? It seems to me that the current implementation is fragile in general: we should take the join into account and only collect filters from the target table's side of the join. For example:

```
> explain format indent UPDATE "trg" SET col = 1 FROM src WHERE trg.id = src.id AND src.type = 'active' AND trg.id > 100;
+---------------+-----------------------------------------------------------------------+
| plan_type     | plan                                                                  |
+---------------+-----------------------------------------------------------------------+
| logical_plan  | Dml: op=[Update] table=[trg]                                          |
|               |   Projection: trg.id AS id, Utf8View("1") AS col                      |
|               |     Inner Join: trg.id = src.id                                       |
|               |       Filter: trg.id > Int32(100)                                     |
|               |         TableScan: trg projection=[id]                                |
|               |       Projection: src.id                                              |
|               |         Filter: src.type = Utf8View("active") AND src.id > Int32(100) |
|               |           TableScan: src projection=[id, type]                        |
+---------------+-----------------------------------------------------------------------+
```

It seems to me that we should do something like:

  1. Find the table scan corresponding to the table being updated.
  2. Walk up the tree until we hit a join / subquery (?) / other blocker.
  3. Collect all filters in that subtree (ideally, once this PR lands, they will all have been pushed into the TableScan, so this becomes trivial).
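The three steps above can be sketched over a toy plan type. This is an illustration under stated assumptions: `Plan`, `Search` and `collect_target_filters` are hypothetical names, predicates are plain strings, and only joins are treated as blockers (the question of subqueries and other blockers is left open above).

```rust
/// Toy plan (hypothetical), shaped like the EXPLAIN output above.
enum Plan {
    Scan { table: String, filters: Vec<String> },
    Filter { predicate: String, input: Box<Plan> },
    Projection { input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
}

/// Outcome of searching a subtree for the target table's scan.
enum Search {
    NotFound,
    /// Target found and no join crossed yet: keep collecting filters.
    Collecting(Vec<String>),
    /// Target found but a join was crossed: filters above it are ignored.
    Blocked(Vec<String>),
}

fn search(plan: &Plan, target: &str) -> Search {
    match plan {
        // Step 1: the target scan starts the collection with its own filters.
        Plan::Scan { table, filters } if table.as_str() == target => {
            Search::Collecting(filters.clone())
        }
        Plan::Scan { .. } => Search::NotFound,
        // Step 3: filters between the scan and the first join are collected.
        Plan::Filter { predicate, input } => match search(input, target) {
            Search::Collecting(mut fs) => {
                fs.push(predicate.clone());
                Search::Collecting(fs)
            }
            other => other,
        },
        Plan::Projection { input } => search(input, target),
        // Step 2: a join stops the walk up; only the target's side counts.
        Plan::Join { left, right } => match search(left, target) {
            Search::NotFound => freeze(search(right, target)),
            found => freeze(found),
        },
    }
}

fn freeze(s: Search) -> Search {
    match s {
        Search::Collecting(fs) => Search::Blocked(fs),
        other => other,
    }
}

/// Returns the filters that apply to `target`, or `None` if it is not scanned.
fn collect_target_filters(plan: &Plan, target: &str) -> Option<Vec<String>> {
    match search(plan, target) {
        Search::NotFound => None,
        Search::Collecting(fs) | Search::Blocked(fs) => Some(fs),
    }
}
```

On the plan above, this returns only `trg.id > 100` for `trg`; the `src` predicates stay on the other side of the join.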

I'm not sure how DML planning is supposed to handle more complex cases, such as those involving EXISTS.

@adriangb
Contributor Author

> Found an issue and stopping as there are other CI issues.

Thank you, I will look into this.

I'm struggling with the CI issue: https://github.com/apache/datafusion/actions/runs/21495116091/job/61927800829?pr=20061#step:4:7993

It seems like the only diff is that the output includes a backtrace. I can reproduce the failure locally if I run with RUST_BACKTRACE=1, but without it there is no failure, which is confusing...

Development

Successfully merging this pull request may close these issues.

Unified TableScan.filters

2 participants